TITLE by Ahmed Nagy

Univariate Plots Section

##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000
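The table above is R's `summary()` output. As a rough cross-check outside R, the same six statistics can be computed with Python's standard library. This is a minimal sketch, not the author's code: the `summary` helper and the toy `quality` list are hypothetical. It relies on `statistics.quantiles(..., method="inclusive")`, which interpolates the same way as R's default quantile type 7, so the quartiles should line up with R's.

```python
import statistics


def summary(values):
    """Min, quartiles, mean and max, in the order R's summary() prints them."""
    # method="inclusive" matches R's default (type 7) quantile interpolation.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical toy quality scores, not the real 1599-wine dataset.
quality = [5, 5, 6, 6, 6, 7, 4, 5, 6, 8, 3]
for name, value in summary(quality).items():
    print(f"{name:<8} {value:.3f}")
```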

I will explore each variable individually.

This plot shows an approximately normal distribution, with most values concentrated around the center.

The distribution of fixed acidity is positively skewed. The median is around 8, with a high concentration of wines between the first and third quartiles (7.1 and 9.2).
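"Positively skewed" can be checked numerically with the moment-based (Fisher-Pearson) skewness coefficient, where a positive value means a longer right tail. The summary table already hints at this: mean fixed acidity (8.32) is above the median (7.90). The sketch below is a hypothetical helper with toy values, not the author's code.

```python
import statistics


def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (population moments).

    Positive -> longer right tail; close to 0 -> roughly symmetric.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical fixed-acidity-like values with a long right tail.
right_tailed = [7.0, 7.2, 7.5, 7.8, 7.9, 8.1, 8.4, 9.0, 10.5, 13.0]
symmetric = [6.0, 7.0, 8.0, 9.0, 10.0]
print(sample_skewness(right_tailed))  # positive
print(sample_skewness(symmetric))     # 0.0
```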

The distribution of volatile acidity looks bimodal, with two peaks around 0.4 and 0.6.
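One rough way to back up a "two peaks" reading is to bin the values and look for bins that hold more wines than both of their neighbours. This is a hypothetical heuristic sketch (the helper names, bin width and toy values are assumptions), and the peaks it finds depend on the bin width, just as a histogram's appearance does.

```python
def histogram(values, start, width, bins):
    """Count values into `bins` equal-width bins beginning at `start`."""
    counts = [0] * bins
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < bins:  # values outside the range are ignored
            counts[i] += 1
    return counts


def local_peaks(counts):
    """Indices of bins whose count is strictly higher than both neighbours."""
    return [
        i
        for i in range(1, len(counts) - 1)
        if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]
    ]


# Hypothetical volatile-acidity-like values clustered near 0.4 and 0.6.
volatile = [0.33, 0.41, 0.42, 0.44, 0.47, 0.53, 0.62, 0.63, 0.64, 0.66, 0.73]
counts = histogram(volatile, start=0.0, width=0.1, bins=10)
print(counts)
print(local_peaks(counts))  # bins 4 and 6, i.e. [0.4, 0.5) and [0.6, 0.7)
```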

The distribution of citric acid looks strange. Some higher values have no data at all, and apart from them the distribution looks almost rectangular. This could indicate an error in the data or incomplete data collection.

For residual sugar, there is a high concentration of wines around 2.2 (the median), with some outliers in the higher range.

Chlorides show a distribution similar to that of residual sugar. Extreme outliers have been removed from this plot.
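How the extreme outliers were removed is not stated. One common rule, shown here only as an assumed example, keeps values inside Tukey's outer fences, Q1 - 3*IQR and Q3 + 3*IQR. The helper name and toy values are hypothetical; 0.611 is borrowed from the maximum chloride value in the summary table.

```python
import statistics


def drop_extreme_outliers(values, k=3.0):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=3.0 is the conventional cutoff for "extreme" outliers; k=1.5 flags mild ones.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if low <= x <= high]


# Hypothetical chloride values; 0.611 matches the dataset maximum.
chlorides = [0.070, 0.075, 0.079, 0.080, 0.085, 0.090, 0.611]
print(drop_extreme_outliers(chlorides))  # 0.611 is dropped
```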

Free sulfur dioxide has a distribution with very few wines above 60.

As expected, the distribution of total sulfur dioxide closely resembles that of free sulfur dioxide.

The distribution of density looks approximately normal.

pH also looks normally distributed.

Sulphates show a distribution similar to those of residual sugar and chlorides.

Sulfur dioxide has a long-tailed distribution.

Univariate Analysis

What is the structure of your dataset?

There are 1599 wines in the dataset, each described by 12 features. One of them (quality) is categorical, and the others are numerical variables describing the physical and chemical properties of the wine.

Other observations: The median quality is 6, which on the given scale (0-10) indicates a mediocre wine. The best wine in the sample has a score of 8, and the worst has a score of 3.

What is/are the main feature(s) of interest in your dataset?

Quality of wines.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The variables related to acidity (fixed acidity, volatile acidity, citric acid, and pH) might explain some of the variance. I suspect that different acid concentrations might alter the taste of the wine. Residual sugar, which determines how sweet a wine is, might also influence taste.

Did you create any new variables from existing variables in the dataset?

No, I did not create any new variables.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Citric acid stood out from the other distributions. Apart from some outliers, it had a roughly rectangular distribution, which seems very unexpected given the wine quality distribution.

Bivariate Plots Section

As we can see, fixed acidity has almost no effect on quality. The mean and median values of fixed acidity remain almost unchanged as quality increases.
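Comparing the mean and median of fixed acidity at each quality level amounts to grouping the wines by quality and summarizing each group. A minimal sketch with a hypothetical helper and toy values, not the author's code:

```python
import statistics
from collections import defaultdict


def mean_and_median_by_group(groups, values):
    """Return {group: (mean, median)} of `values`, grouped by the matching `groups` entry."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    return {
        group: (statistics.fmean(vals), statistics.median(vals))
        for group, vals in sorted(buckets.items())
    }


# Hypothetical toy data: one quality score and one fixed-acidity value per wine.
quality = [5, 5, 6, 6, 7, 7]
fixed_acidity = [7.4, 7.8, 8.0, 7.6, 8.2, 7.8]
for q, (mean, median) in mean_and_median_by_group(quality, fixed_acidity).items():
    print(f"quality {q}: mean {mean:.2f}, median {median:.2f}")
```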

Volatile acidity seems to have a negative impact on the quality of the wine: as the volatile acidity level goes up, the quality of the wine degrades.

Citric acid seems to have a positive correlation with wine quality: better wines have higher citric acid.

This chart shows that residual sugar has no noticeable effect on the quality of the wine.

The previous chart shows that wines with lower chloride concentrations tend to be better.

Here we see that a very low concentration of free sulfur dioxide is associated with poor wine, and a very high concentration with average wine.

Since free sulfur dioxide is a subset of total sulfur dioxide, we see a similar pattern here.

It seems that lower density is associated with higher-quality wine.

Better wines seem to have a slightly lower pH, but pH has no big effect on quality.

Even though we see many outliers among the ‘Average’ quality wines, it seems that better wines have a higher concentration of sulphates.

The correlation is very clear here: better wines evidently have a higher alcohol content. However, there are many outliers, so alcohol alone may not be what makes a wine good. Let's fit a simple linear model and look at the statistics.

## 
## Call:
## lm(formula = as.numeric(quality) ~ alcohol, data = red)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8442 -0.4112 -0.1690  0.5166  2.5888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.87497    0.17471   10.73   <2e-16 ***
## alcohol      0.36084    0.01668   21.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared:  0.2267, Adjusted R-squared:  0.2263 
## F-statistic: 468.3 on 1 and 1597 DF,  p-value: < 2.2e-16

Based on the R-squared value (0.2267), alcohol alone explains only about 23% of the variance in wine quality, so other variables must be at play. I have to identify them in order to build a better regression model.
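For a single predictor, the fitted line and R-squared can be computed directly from sums of squares. The sketch below (hypothetical helper `simple_ols`, toy data) does that, then plugs the intercept and slope from the `lm()` output above into the fitted equation. For a one-predictor model, R-squared also equals the squared Pearson correlation: the alcohol correlation of 0.47616632 reported in the correlation test below gives 0.47616632² ≈ 0.2267, matching the Multiple R-squared.

```python
import statistics


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Toy data on an exact line y = 1 + 2x, so R-squared is 1.
print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0, 1.0)

# Coefficients copied from the lm() output above.
INTERCEPT, SLOPE = 1.87497, 0.36084


def predicted_quality(alcohol):
    """Fitted quality for a given alcohol level under the alcohol-only model."""
    return INTERCEPT + SLOPE * alcohol


print(round(predicted_quality(10.0), 3))  # 5.483
```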

Next, I will test the correlation of each variable with the quality of the wine.

##        fixed.acidity     volatile.acidity          citric.acid 
##           0.12405165          -0.39055778           0.22637251 
## log10.residual.sugar      log10.chlordies  free.sulfur.dioxide 
##           0.02353331          -0.17613996          -0.05065606 
## total.sulfur.dioxide              density                   pH 
##          -0.18510029          -0.17491923          -0.05773139 
##      log10.sulphates              alcohol 
##           0.30864193           0.47616632

From the correlation test, the following variables appear to be the most strongly correlated with wine quality:

  1. Alcohol
  2. Sulphates
  3. Volatile Acidity
  4. Citric Acid
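The ranking can be made explicit by sorting the correlations by absolute value, since a strong negative correlation (volatile acidity) matters as much as a strong positive one. The sketch below copies the coefficients from the output above; the `pearson` helper shows how each coefficient is defined. Both names are hypothetical rather than the author's R code, and the keys keep the column names exactly as printed, including `log10.chlordies`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Correlations with quality, copied from the output above.
correlations = {
    "fixed.acidity": 0.12405165,
    "volatile.acidity": -0.39055778,
    "citric.acid": 0.22637251,
    "log10.residual.sugar": 0.02353331,
    "log10.chlordies": -0.17613996,
    "free.sulfur.dioxide": -0.05065606,
    "total.sulfur.dioxide": -0.18510029,
    "density": -0.17491923,
    "pH": -0.05773139,
    "log10.sulphates": 0.30864193,
    "alcohol": 0.47616632,
}

# Rank by strength (absolute value); the sign only gives the direction.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
print(ranked[:4])  # ['alcohol', 'volatile.acidity', 'log10.sulphates', 'citric.acid']
```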

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

There is a clear positive relationship between alcohol and the quality of the wine.

There is a negative relationship between volatile acidity and quality, and lower densities are associated with better wine.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Free and total sulfur dioxide appear to be closely related.

What was the strongest relationship you found?

The relationship between total sulfur dioxide and free sulfur dioxide.

Multivariate Plots Section

Wines lower in volatile acidity and higher in alcohol tend to be of better quality.

Higher-quality wines have higher alcohol and higher citric acid. Higher levels of both alcohol and sulphates also seem to go with better-quality wine.

Low volatile acidity combined with high sulphates is associated with good wine.

Wines low in volatile acidity and high in citric acid tend to be of good quality.
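The two-variable plots described above can be summarized numerically by splitting both variables at their medians and averaging quality in each of the four cells. This is a hypothetical sketch with toy values, not the author's method; values equal to the median are counted as "low".

```python
import statistics
from collections import defaultdict


def mean_quality_by_median_split(a, b, quality):
    """Mean quality for each (a level, b level) cell after splitting a and b at their medians."""
    a_med, b_med = statistics.median(a), statistics.median(b)
    cells = defaultdict(list)
    for ai, bi, q in zip(a, b, quality):
        key = ("high" if ai > a_med else "low", "high" if bi > b_med else "low")
        cells[key].append(q)
    return {key: statistics.fmean(qs) for key, qs in sorted(cells.items())}


# Hypothetical toy wines: alcohol, volatile acidity and quality.
alcohol = [9.4, 9.8, 10.5, 11.0, 12.0, 12.5]
volatile = [0.70, 0.40, 0.65, 0.35, 0.60, 0.30]
quality = [5, 5, 5, 6, 6, 7]
for (alc, vol), mean_q in mean_quality_by_median_split(alcohol, volatile, quality).items():
    print(f"alcohol {alc}, volatile acidity {vol}: mean quality {mean_q:.2f}")
```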

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

There is a clear positive relationship between alcohol and the quality of the wine.

Were there any interesting or surprising interactions between features?

Residual sugar has no noticeable effect on the quality of the wine.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.


Final Plots and Summary

Plot One

Description One

This chart shows that a high alcohol content combined with low volatile acidity has a big influence on wine quality. This is consistent with the correlations: alcohol is positively correlated with quality, whereas volatile acidity is negatively correlated with it.

Plot Two

Description Two

Every analysis so far suggests that high alcohol and high sulphate concentrations combined seem to produce better wines, since both alcohol and sulphates are positively correlated with quality.

Plot Three

Description Three

Every analysis so far suggests that wines higher in both sulphates and citric acid tend to be of much higher quality.

As the correlation test showed, both sulphates and citric acid are positively correlated with quality.

Reflection

The biggest challenge I faced when I started analyzing this dataset was that many variables might be responsible for, or related to, wine quality, and I had to determine which of them actually affect quality.

So I started by exploring each variable on its own, looking at the general shape of its distribution and noting anything abnormal. As I expected, I found that many variables, such as residual sugar, have no noticeable effect on wine quality.

So I computed the linear correlation between each variable and quality, and found the four factors most strongly correlated with quality. Three of them have a positive correlation:

  1. Alcohol
  2. Sulphates
  3. Citric Acid

Only one has a negative correlation:

  1. Volatile Acidity

I then analyzed these factors together.

In the final part of my analysis, I created multivariate plots to see whether some combinations of variables together affected the overall quality of the wine. I found that alcohol and citric acid have a big effect on quality.

For future analysis, I would love to have a dataset where, apart from the wine quality, each wine is also ranked by five different wine tasters. When the human element is included, opinions change based on many different factors. By including the human element in my analysis, I would be able to add that perspective and see many previously unseen factors that might result in a better or worse wine quality. Having these factors in the dataset would lead to different insights altogether.